Skip to content

fix: robust bug fixes and storage verification tools#110

Open
zerodiscount wants to merge 12 commits intoagritheory:version-15from
zerodiscount:pr-submission
Open

fix: robust bug fixes and storage verification tools#110
zerodiscount wants to merge 12 commits intoagritheory:version-15from
zerodiscount:pr-submission

Conversation

@zerodiscount
Copy link
Copy Markdown

Description

This PR addresses several critical issues and adds valid verification tools for the cloud storage integration.

Bug Fixes

  • Recursion Fix: Resolved infinite recursion in �alidate() caused by �ssociate_files() triggering a save.
  • Duplicate Logic: Fixed write_file to only skip upload if the duplicate file actually exists in the cloud (s3_key check). Previously, two local duplicates would deadlock and neither would upload.
  • Vue Dependency: Added �ue to package.json to resolve �ench build failures on v15.

New Features

  • �erify_storage Script: Added a library/CLI tool to audit file synchronization between DB, Local, and S3.
  • Robust Migration: Enhanced migrate_files with better error handling (tracebacks) and explicit logging.

Antigravity Agent and others added 12 commits January 10, 2026 15:12
This commit introduces several stability improvements and tools for the Cloud Storage app, developed during a large-scale enterprise migration.

1.  **Fix Recursion Error**:
    *   Addressed a RecursionError in CloudStorageFile.validate. The recursive loop occurred when associate_files() called save(), re-triggering validation. This fix checks ignore_file_validate *before* calling association logic.

2.  **Robust Migration Tools**:
    *   Updated migrate-files-to-cloud-storage:
        *   **Smart Sync**: Checks if file already exists in S3 (head_object) before uploading. If found, it syncs the DB record (s3_key) and skips upload.
        *   **Validation Bypass**: Added ignore_links, ignore_permissions, and ignore_validate flags to file_doc.

3.  **File Logic Hardening**:
    *   Updated write_file and associate_files in overrides/file.py to correctly propagate validation flags when updating *existing* file records.
    *   Patched has_permission to gracefully handle DoesNotExistError.
Automatically generated by python-semantic-release
Automatically generated by python-semantic-release
Automatically generated by python-semantic-release
Automatically generated by python-semantic-release
@agritheory agritheory requested a review from lauty95 January 14, 2026 15:30
Copy link
Copy Markdown
Collaborator

@lauty95 lauty95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

Great audit tool! I have a few suggestions:

  1. Documentation: Please add documentation in the docs/ directory explaining what this tool does and how to use it (similar to commands.md).

  2. CLI Command: Consider adding a bench command in commands.py for easier access

  3. Additional Statistics: Consider adding more useful metrics:

  • Total storage used (size in MB/GB) - you can reuse _format_bytes() from migration.py
  • Local files not yet migrated (count)
  • Files marked as migrated but missing in cloud (this is critical for data integrity)

Comment thread pyproject.toml
[tool.poetry]
name = "cloud_storage"
version = "15.9.1"
version = "1.1.1"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version change from 15.9.1 to 1.1.1 breaks the existing versioning convention. Please keep the major version aligned with the Frappe version (e.g., 15.9.1) for clarity and to indicate compatibility.

Comment thread package.json
"jszip": "^3.10.1",
"madr": "^3.0.0"
"madr": "^3.0.0",
"vue": "^3.3.4"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR body you wrote: "Vue Dependency: Added Vue to package.json to resolve bench build failures on v15."
Could you share the exact error message?

Frappe v15 already includes Vue 3 in core, so it shouldn't throw an error. Adding Vue here might cause version conflicts or bundle duplication.


local_count = 0
minio_count = 0
synced_count = 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synced_count is an unused variable.

client = get_cloud_storage_client()
bucket = client.bucket
except Exception as e:
print(f"Failed to initialize MinIO client: {e}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This project supports S3, DigitalOcean Spaces, Backblaze, etc. Consider using generic messaging like:

storage_name = config.get("endpoint_url", "Cloud Storage")
print(f"Failed to initialize client: {e}")
print(f"Found in Cloud Storage: {minio_count}")

print(f"\n[ERROR] Missing in MinIO (DB claims migrated) ({len(missing_minio)}):")
print(", ".join(missing_minio[:10]) + ("..." if len(missing_minio) > 10 else ""))

# Verification of count
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing local_count vs minio_count is confusing because:

  • After a complete migration with remove_local=True, local_count would be 0
  • If a migration is in progress, they will always be different

A better logic would be:

marked_as_migrated = len([f for f in files if f.s3_key])
if marked_as_migrated != minio_count:
    print(f"[ERROR] Sync Issue: DB claims {marked_as_migrated} files migrated, but only {minio_count} found in cloud!")

This catches the critical case where DB thinks files are migrated but they're missing from cloud storage.

if f.is_private:
local_path = frappe.get_site_path("private", "files", f.file_name)
else:
local_path = frappe.get_site_path("public", "files", f.file_name)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes all public files are in public/files/, but they can be in various paths (/files/, /assets/, etc.).

Should we use something like:

local_path = frappe.get_site_path("public", f.file_url.lstrip("/"))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants